
<!DOCTYPE html>

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />

    <title>Data Processing and Analysis with pandas &#8212; Joyful Pandas 1.0 documentation</title>
<script>
  document.documentElement.dataset.mode = localStorage.getItem("mode") || "";
  document.documentElement.dataset.theme = localStorage.getItem("theme") || "light"
</script>

  <!-- Loaded before other Sphinx assets -->
  <link href="_static/styles/theme.css?digest=92025949c220c2e29695" rel="stylesheet">
<link href="_static/styles/pydata-sphinx-theme.css?digest=92025949c220c2e29695" rel="stylesheet">


  <link rel="stylesheet"
    href="_static/vendor/fontawesome/5.13.0/css/all.min.css">
  <link rel="preload" as="font" type="font/woff2" crossorigin
    href="_static/vendor/fontawesome/5.13.0/webfonts/fa-solid-900.woff2">
  <link rel="preload" as="font" type="font/woff2" crossorigin
    href="_static/vendor/fontawesome/5.13.0/webfonts/fa-brands-400.woff2">

    <link rel="stylesheet" type="text/css" href="_static/pygments.css" />
    <link rel="stylesheet" type="text/css" href="_static/plot_directive.css" />
    <link rel="stylesheet" type="text/css" href="_static/css/s4defs-roles.css" />

  <!-- Pre-loaded scripts that we'll load fully later -->
  <link rel="preload" as="script" href="_static/scripts/pydata-sphinx-theme.js?digest=92025949c220c2e29695">

    <script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
    <script src="_static/jquery.js"></script>
    <script src="_static/underscore.js"></script>
    <script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
    <script src="_static/doctools.js"></script>
    <script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="Supplementary Exercises" href="%E8%A1%A5%E5%85%85%E4%B9%A0%E9%A2%98.html" />
    <link rel="prev" title="Datawhale" href="Datawhale.html" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="docsearch:language" content="en">
  </head>
  
  
  <body data-spy="scroll" data-target="#bd-toc-nav" data-offset="180" data-default-mode="">
    <div class="bd-header-announcement container-fluid" id="banner">
      

    </div>

    
    <nav class="bd-header navbar navbar-light navbar-expand-lg bg-light fixed-top bd-navbar" id="navbar-main"><div class="bd-header__inner container-xl">

  <div id="navbar-start">
    
    
  


<a class="navbar-brand logo" href="index.html">
  
  
  
  
    <img src="_static/finallogo1.svg" class="logo__image only-light" alt="Logo image">
    <img src="_static/finallogo1.svg" class="logo__image only-dark" alt="Logo image">
  
  
</a>
    
  </div>

  <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbar-collapsible" aria-controls="navbar-collapsible" aria-expanded="false" aria-label="Toggle navigation">
    <span class="fas fa-bars"></span>
  </button>

  
  <div id="navbar-collapsible" class="col-lg-9 collapse navbar-collapse">
    <div id="navbar-center" class="mr-auto">
      
      <div class="navbar-center-item">
        <ul id="navbar-main-elements" class="navbar-nav">
    <li class="toctree-l1 nav-item">
 <a class="reference internal nav-link" href="Home.html">
  Home
 </a>
</li>

<li class="toctree-l1 nav-item">
 <a class="reference internal nav-link" href="Content/index.html">
  Content
 </a>
</li>

<li class="toctree-l1 nav-item">
 <a class="reference internal nav-link" href="Author.html">
  Author
 </a>
</li>

<li class="toctree-l1 nav-item">
 <a class="reference internal nav-link" href="Datawhale.html">
  Datawhale
 </a>
</li>

<li class="toctree-l1 current active nav-item">
 <a class="current reference internal nav-link" href="#">
  Data Processing and Analysis with pandas
 </a>
</li>

<li class="toctree-l1 nav-item">
 <a class="reference internal nav-link" href="%E8%A1%A5%E5%85%85%E4%B9%A0%E9%A2%98.html">
  Supplementary Exercises
 </a>
</li>

    
    <li class="nav-item">
        <a class="nav-link nav-external" href="https://pandas.pydata.org/docs/index.html">Doc<i class="fas fa-external-link-alt"></i></a>
    </li>
    
</ul>
      </div>
      
    </div>

    <div id="navbar-end">
      
      <div class="navbar-end-item">
        <span id="theme-switch" class="btn btn-sm btn-outline-primary navbar-btn rounded-circle">
    <a class="theme-switch" data-mode="light"><i class="fas fa-sun"></i></a>
    <a class="theme-switch" data-mode="dark"><i class="far fa-moon"></i></a>
    <a class="theme-switch" data-mode="auto"><i class="fas fa-adjust"></i></a>
</span>
      </div>
      
      <div class="navbar-end-item">
        <ul id="navbar-icon-links" class="navbar-nav" aria-label="Icon Links">
        <li class="nav-item">
          <a class="nav-link" href="https://github.com/datawhalechina/joyful-pandas" rel="noopener" target="_blank" title="GitHub"><span><i class="fab fa-github-square"></i></span>
            <label class="sr-only">GitHub</label></a>
        </li>
      </ul>
      </div>
      
    </div>
  </div>
</div>
    </nav>
    

    <div class="bd-container container-xl">
      <div class="bd-container__inner row">
          

<!-- Only show if we have sidebars configured, else just a small margin  -->
<div class="bd-sidebar-primary col-12 col-md-3 bd-sidebar">
  <div class="sidebar-start-items"><form class="bd-search d-flex align-items-center" action="search.html" method="get">
  <i class="icon fas fa-search"></i>
  <input type="search" class="form-control" name="q" id="search-input" placeholder="Search the docs ..." aria-label="Search the docs ..." autocomplete="off" >
</form><nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation">
  <div class="bd-toc-item active">
    
  </div>
</nav>
  </div>
  <div class="sidebar-end-items">
  </div>
</div>


          


<div class="bd-sidebar-secondary d-none d-xl-block col-xl-2 bd-toc">
  
    
    <div class="toc-item">
      
<div class="tocsection onthispage mt-5 pt-1 pb-3">
    <i class="fas fa-list"></i> On this page
</div>

<nav id="bd-toc-nav">
    <ul class="visible nav section-nav flex-column">
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#id1">
   Purchase links
  </a>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#id2">
   Companion resources
  </a>
 </li>
 <li class="toc-h2 nav-item toc-entry">
  <a class="reference internal nav-link" href="#id3">
   Errata
  </a>
  <ul class="nav section-nav flex-column">
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id4">
     1st edition, 3rd printing
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id5">
     1st edition, 2nd printing
    </a>
   </li>
   <li class="toc-h3 nav-item toc-entry">
    <a class="reference internal nav-link" href="#id6">
     1st edition, 1st printing
    </a>
   </li>
  </ul>
 </li>
</ul>

</nav>
    </div>
    
    <div class="toc-item">
      
    </div>
    
  
</div>


          
          
          <div class="bd-content col-12 col-md-9 col-xl-7">
              
              <article class="bd-article" role="main">
                
  <section id="pandas">
<h1>Data Processing and Analysis with pandas (pandas数据处理与分析)<a class="headerlink" href="#pandas" title="Permalink to this heading">#</a></h1>
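<p>This book has 13 chapters. The first ten come from the Joyful Pandas tutorial, with substantial revisions to the main text, the practice boxes (练一练), and the exercises; the original chapter structure is kept, and each chapter refines the tutorial's details to a varying degree. On top of that, the book adds three new chapters on data observation, feature engineering, and performance optimization. The data observation chapter helps readers build a fairly comprehensive grasp of visualization methods and of the different ways to inspect a dataset. The feature engineering chapter describes methods for feature construction and feature selection, which apply widely to data processing, research, and competition tasks on real structured data. The performance optimization chapter covers how to write efficient pandas code and how to use multiprocessing, Cython, and Numba to get as much performance out of that code as possible. The specific changes to the practice boxes and exercises, and the details of the new chapters, are listed in the <a class="reference external" href="https://github.com/datawhalechina/joyful-pandas">GitHub repository</a>. While this book was being written, Joyful Pandas was listed on the <a class="reference external" href="https://pandas.pydata.org/docs/dev/getting_started/tutorials.html#joyful-pandas">pandas website</a> as a recommended Chinese-language tutorial for pandas. We also thank the pandas core development team for their many years of maintenance and community building!</p>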
<a class="reference internal image-reference" href="_images/pandas封面.jpg"><img alt="_images/pandas封面.jpg" class="align-right" src="_images/pandas封面.jpg" style="height: 300px;" /></a>
<section id="id1">
<h2>Purchase links<a class="headerlink" href="#id1" title="Permalink to this heading">#</a></h2>
<ul class="simple">
<li><p><a class="reference external" href="http://product.dangdang.com/29434656.html">Dangdang</a></p></li>
<li><p><a class="reference external" href="https://item.jd.com/13268767.html">JD.com</a></p></li>
</ul>
</section>
<section id="id2">
<h2>Companion resources<a class="headerlink" href="#id2" title="Permalink to this heading">#</a></h2>
<ul class="simple">
<li><p><a class="reference external" href="https://pan.baidu.com/s/16fgy9qYXo0JOsz3GIXQeKA">Datasets</a> (extraction code: 9e8r)</p></li>
<li><p><a class="reference external" href="https://gyhhaha.github.io/pd-book/">Reference answers</a></p></li>
</ul>
</section>
<section id="id3">
<h2>Errata<a class="headerlink" href="#id3" title="Permalink to this heading">#</a></h2>
<section id="id4">
<h3>1st edition, 3rd printing<a class="headerlink" href="#id4" title="Permalink to this heading">#</a></h3>
<ul class="simple">
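<li><p>Author biography: the author graduated in December 2022, so "master's student" (硕士在读) should be changed to "master's degree" (硕士).</p></li>
<li><p>Page 40: delete either the first or the second row of the Out[78] result; the row with index 0 is duplicated.</p></li>
<li><p>Page 62: in Out[53], the first element of the inner row index on line 3 should be "Junior".</p></li>
<li><p>Page 133: the output of Out[32] should be ['nmm', 'nmm'].</p></li>
<li><p>Page 144: each of the three elements in the Out[99] output is missing two asterisks on its left side.</p></li>
<li><p>Page 200: the first line of In[26] should read (8, 6), not (8.6).</p></li>
<li><p>Page 209: in the sentence "图11.29中可视化的效果非常糟糕" ("the visualization in Figure 11.29 is very poor"), the reference should be to Figure 11.32.</p></li>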
</ul>
</section>
<section id="id5">
<h3>1st edition, 2nd printing<a class="headerlink" href="#id5" title="Permalink to this heading">#</a></h3>
<ul class="simple">
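<li><p>Page 58: there is an extra "or" at the end of line 1; delete it.</p></li>
<li><p>Page 94: in the second-to-last line of In[18], the value of sep should be "_", not a space.</p></li>
<li><p>Page 101: in the formula on line 7, the condition after the vertical bar of the set should be a∈A; the A is missing.</p></li>
<li><p>Page 106: in Out[20], the value in column A of row 2 should be 1.</p></li>
<li><p>Page 122: "Series" is misspelled on line 1.</p></li>
<li><p>Page 135: in the first line of the code for Practice (练一练) 8-2, the file should be opened for reading with mode "r", not "w".</p></li>
<li><p>Page 196: some cells in Figure 11.11 do not match the output; use the following version instead:</p></li>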
</ul>
<a class="reference internal image-reference" href="_images/11-11.png"><img alt="_images/11-11.png" class="align-center" src="_images/11-11.png" style="width: 360.0px; height: 240.0px;" /></a>
</section>
<section id="id6">
<h3>1st edition, 1st printing<a class="headerlink" href="#id6" title="Permalink to this heading">#</a></h3>
<ul class="simple">
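<li><p>Page 15: line 1 of In[75] should be a = np.array([[1, 2],[3, 4]])</p></li>
<li><p>Page 32: line 3 of In[35] should be df["col_3"] = ["apple", "banana", "cat"]</p></li>
<li><p>Page 47: in Exercise 3, part (1), the second line of the second paragraph should begin "其中 <span class="math notranslate nohighlight">\(w_0\)</span> 表示序列…" ("where <span class="math notranslate nohighlight">\(w_0\)</span> denotes the sequence…"); delete the " <span class="math notranslate nohighlight">\(=0\)</span> ".</p></li>
<li><p>Page 64: In[67] should be df_ex.loc[idx["C":, ("D", "f"):]]</p></li>
<li><p>Page 70: there should be a blank gap separating the code blocks between In[87] and Out[87].</p></li>
<li><p>Page 98: in the first table of Exercise 1, every element of the First_Area column should be the character "A".</p></li>
<li><p>Page 99: line 3 of Exercise 2 should be changed to "其中“日期”“统计类别”和“资源名称”3列已为…" ("where the three columns 日期 (date), 统计类别 (statistic category), and 资源名称 (resource name) are already…").</p></li>
<li><p>Page 104: the third-to-last line should read "进行所有行的笛卡儿积" ("take the Cartesian product of all rows"), with 行 (rows) rather than 列 (columns).</p></li>
<li><p>Page 111: in the second-to-last line of the text paragraph, the function name should be "haversine_distances()"; an "s" is missing.</p></li>
<li><p>Page 115: the end of line 2 of the second text paragraph should read "分别进行以上后2种情况的检索" ("perform the lookup for each of the latter two cases above").</p></li>
<li><p>Page 119: the end of line 3 of the note should read "而当选用spline的插值方法…" ("whereas when the spline interpolation method is chosen…").</p></li>
<li><p>Page 132: in Out[28], there should be no space after Abc.</p></li>
<li><p>Page 134: on line 2 of In[41], there should be a space before banana.</p></li>
<li><p>Page 185: line 2 should give the pip installation command: pip install prophet</p></li>
<li><p>Page 210: line 1 of the note should read "pandas-profiling在3.1.x版本下…" ("in version 3.1.x of pandas-profiling…").</p></li>
<li><p>Page 223: the box marked as a note should instead be marked "练一练" (Practice).</p></li>
<li><p>Page 264: the second-to-last line of In[14] should be indented.</p></li>
<li><p>Page 265: the timing result of In[16] does not count as output; delete the Out[16] label.</p></li>
<li><p>Page 265: the timing result of In[17] should be separated from In[17].</p></li>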
</ul>
</section>
</section>
</section>


              </article>
              

              
          </div>
          
      </div>
    </div>

  
  
  <!-- Scripts loaded after <body> so the DOM is not blocked -->
  <script src="_static/scripts/pydata-sphinx-theme.js?digest=92025949c220c2e29695"></script>

<footer class="bd-footer"><div class="bd-footer__inner container">
  
  <div class="footer-item">
    <p class="copyright">
    &copy; Copyright 2020-2022, Datawhale, 耿远昊.<br>
</p>
  </div>
  
  <div class="footer-item">
    <p class="sphinx-version">
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 5.0.2.<br>
</p>
  </div>
  
</div>
</footer>
  </body>
</html>